Test-cost-sensitive attribute reduction
Abstract
In many data mining and machine learning applications, classification has two objectives: decreasing the test cost and improving the classification accuracy. Most existing research focuses on the latter, with attribute reduction serving as an optional pre-processing stage to remove redundant attributes. In this paper, we point out that when tests must be undertaken in parallel, attribute reduction is mandatory for dealing with the former objective. With this in mind, we pose the minimal test cost reduct problem, which is a new and more general problem than the classical reduct problem. We also define three metrics to evaluate the performance of reduction algorithms from a statistical viewpoint. A framework for a heuristic algorithm is proposed to deal with the new problem. Specifically, an information gain-based λ-weighted reduction algorithm is designed, where the weights are determined by test costs and a non-positive exponent λ, which is the only parameter set by the user. The algorithm is tested with three representative test cost distributions on four UCI (University of California Irvine) datasets. Experimental results show that there is a trade-off in setting λ, and that a competition approach can significantly improve the quality of the result. This study suggests potential application areas and new research trends concerning attribute reduction.
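The abstract names the heuristic but does not give its exact formula. The following is a minimal sketch under stated assumptions. The attribute score is assumed to be the information gain multiplied by c(a)^λ, with λ ≤ 0, so that a negative λ penalizes expensive tests. Attributes are added greedily until the conditional entropy H(D | B) matches that of the full attribute set, and a backward pass then removes redundant attributes. The "competition approach" is modeled as running several λ values and keeping the reduct with the lowest total test cost. The function names `conditional_entropy`, `lambda_weighted_reduct` and `competition`, the stopping criterion, and the toy table are all hypothetical illustrations, not the paper's definitions.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, labels, attrs):
    """H(D | attrs): decision entropy given the partition induced by attrs."""
    blocks = defaultdict(list)
    for row, label in zip(rows, labels):
        blocks[tuple(row[a] for a in attrs)].append(label)
    n = len(rows)
    h = 0.0
    for block in blocks.values():
        for count in Counter(block).values():
            p = count / len(block)
            h -= (len(block) / n) * p * log2(p)
    return h


def lambda_weighted_reduct(rows, labels, costs, lam, eps=1e-9):
    """Greedy reduct; ASSUMED score = information gain * cost ** lam."""
    if lam > 0:
        raise ValueError("lambda must be non-positive")
    all_attrs = list(range(len(costs)))
    # Stopping criterion (assumption): match H(D | C) of the full attribute set.
    target = conditional_entropy(rows, labels, all_attrs)
    chosen = []
    current = conditional_entropy(rows, labels, chosen)
    while current - target > eps:
        remaining = [a for a in all_attrs if a not in chosen]

        def score(a):
            gain = current - conditional_entropy(rows, labels, chosen + [a])
            return gain * costs[a] ** lam  # lam < 0 penalizes costly tests

        chosen.append(max(remaining, key=score))
        current = conditional_entropy(rows, labels, chosen)
    # Backward pass: drop attributes that became redundant, most expensive first.
    for a in sorted(chosen, key=lambda x: -costs[x]):
        trial = [b for b in chosen if b != a]
        if conditional_entropy(rows, labels, trial) - target <= eps:
            chosen = trial
    return sorted(chosen)


def competition(rows, labels, costs, lambdas):
    """Run each lambda and keep the reduct with the lowest total test cost."""
    best = None
    for lam in lambdas:
        reduct = lambda_weighted_reduct(rows, labels, costs, lam)
        total = sum(costs[a] for a in reduct)
        if best is None or total < best[1]:
            best = (reduct, total, lam)
    return best


# Hypothetical toy decision table: a0 alone decides the class but costs 10;
# the cheap tests a1 and a2 (cost 1 each) decide it jointly.
rows = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1)]
labels = [0, 0, 0, 1]
costs = [10, 1, 1]

print(lambda_weighted_reduct(rows, labels, costs, 0))    # [0]
print(lambda_weighted_reduct(rows, labels, costs, -1))   # [1, 2]
print(competition(rows, labels, costs, [0, -0.5, -1]))   # ([1, 2], 2, -0.5)
```

With λ = 0 the sketch reduces to plain information gain and picks the single most informative test regardless of its cost. A negative λ steers it toward the cheaper pair of tests. The competition step keeps the cheapest reduct over the λ values tried, which illustrates, in this toy case only, why the choice of λ matters.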
Related articles
Rules-based Classification with Limited Cost
In test-cost-sensitive decision systems, it is difficult to find an optimal attribute set and construct a high-quality classifier with limited cost. Minimal test-cost-sensitive attribute reduction has been proposed to address the former problem. However, minimal test-cost-sensitive attribute reduction inevitably removes some good, or even better, attributes. As a result, the classifi...
Minimal Cost Attribute Reduction through Backtracking
Test costs and misclassification costs are the two most important types of cost in cost-sensitive learning. In decision systems with both costs, there is a trade-off between them when building a classifier. Generally, as more attributes are selected and more information becomes available, the test cost increases and the misclassification cost decreases. We shall deliberately select an attribute subset such that the...
Cost-sensitive Naïve Bayes Classification of Uncertain Data
Data uncertainty is widespread in real-world applications. It has captured a lot of attention, but little work has been devoted to cost-sensitive algorithms for uncertain data. The paper proposes a novel cost-sensitive Naïve Bayes algorithm, CS-DTU, for classifying and predicting uncertain datasets. In the paper, we apply probability and statistics theory to an uncertain data model, define ...
Cost-sensitive Attribute Value Acquisition for Support Vector Machines∗
We consider cost-sensitive attribute value acquisition in classification problems, where missing attribute values in test instances can be acquired at some cost. We examine this problem in the context of the support vector machine, employing a generic, iterative framework that aims to minimize both acquisition and misclassification costs. Under this framework, we propose an attribute value acqu...
Cost-Sensitive Test Strategies
In medical diagnosis doctors must often determine what medical tests (e.g., X-ray, blood tests) should be ordered for a patient to minimize the total cost of medical tests and misdiagnosis. In this paper, we design cost-sensitive machine learning algorithms to model this learning and diagnosis process. Medical tests are like attributes in machine learning whose values may be obtained at cost (a...
Journal title:
- Inf. Sci.
Volume 181
Publication date: 2011